Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobparser for crd TrainingJob #14

Merged
merged 9 commits into from
Apr 12, 2018

Conversation

m3ngyang
Copy link
Collaborator

New parser reuses the most codes of the old one, except it returns an integrated CRD TrainingJob instead of separated ReplicaSet or Job.

corev1.EnvVar{Name: "PADDLE_INIT_NUM_GRADIENT_SERVERS", Value: strconv.Itoa(job.Spec.Trainer.MinInstance)},
corev1.EnvVar{Name: "PADDLE_INIT_NUM_PASSES", Value: strconv.Itoa(job.Spec.Passes)},
corev1.EnvVar{Name: "PADDLE_INIT_USE_GPU", Value: needGPU},
corev1.EnvVar{Name: "LD_LIBRARY_PATH", Value: "/usr/local/cuda/lib64"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LD_LIBRARY_PATH 的值 需要根据 volumeMounts中 nvidia-libraries 的值进行更新

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个如何更新呢?

@m3ngyang
Copy link
Collaborator Author

@helinwang @typhoonzero plz help review the pr.

Copy link
Collaborator

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these files are quite the same to the original ones, can we remove the old files?

func parseToPserver(job *paddlev1.TrainingJob) *v1beta1.ReplicaSet {
replicas := int32(job.Spec.Pserver.MinInstance)
command := make([]string, 2, 2)
// FIXME: refine these part.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME comment should add your id like FIXME(typhoonzero).

corev1.EnvVar{Name: "TRAINERS", Value: strconv.Itoa(job.Spec.Trainer.MinInstance)},
corev1.EnvVar{Name: "PSERVERS", Value: strconv.Itoa(job.Spec.Pserver.MinInstance)},
corev1.EnvVar{Name: "ENTRY", Value: job.Spec.Trainer.Entrypoint},
// FIXME: TOPOLOGY deprecated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above


func (s *TrainingJob) String() string {
b, _ := json.MarshalIndent(s, "", " ")
return string(b[:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think string(b) will suffice.
Btw, just curious why String method is required, I normally use fmt.Sprintf("%+v", foo)
From https://golang.org/pkg/fmt/#hdr-Printing:

%v	the value in a default format
	when printing structs, the plus flag (%+v) adds field names
%#v	a Go-syntax representation of the value
%T	a Go-syntax representation of the type of the value
%%	a literal percent sign; consumes no value

}

// DefaultJobParser implement a basic JobParser.
type DefaultJobParser int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

type DefaultJobParser struct {}

is more extensible.

In this case, using int as the underlying type is a little strange, since it allows int-operators such as parserA + parserB, which has no real meaning. Plus, a := DefaultJobParser(1) is no different from b := DefaultJobParser(2), which is strange.

I think struct{} is a good default if the receiver's methods don't use the value of the receiver. E.g., in this case, method Validate and other methods don't use the value of the receiver p.

type DefaultJobParser int

// Validate updates default values for the added job and validates the fields.
func (p *DefaultJobParser) Validate(job *paddlev1.TrainingJob) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change name to SetDefaultAndValidate since this method not only validates, but also sets default values.

if !job.Spec.FaultTolerant && job.Elastic() {
return errors.New("max-instances should equal to min-instances when fault_tolerant is disabled")
}
// TODO: add validations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add id for todo? E.g., // TODO(helin): ...


// JobParser is a interface can parse given simple TrainingJob struct to
// an integrated TrainingJob
type JobParser interface {
Copy link
Contributor

@helinwang helinwang Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this interface?

My personal rule of thumb is not to use interface unless necessary, here are the common reasons for using interface:

  1. to avoid circular package dependency.
  2. multiple implementations of an interface.
  3. dependency inversion principle: it is about inverting dependency across different packages, so only necessary when multiple packages are involved.

I typically start with one implementation without interface, when there are actually multiple implementations, create an interface. Typically there will be never a need for multiple implementations.

Some related articles: https://medium.com/@cep21/what-accept-interfaces-return-structs-means-in-go-2fe879e25ee8


// JobParser is a interface can parse given simple TrainingJob struct to
// an integrated TrainingJob
type JobParser interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "parser" name is a little confusing, typically parser parses some raw data into a data structure. But judging from the methods, no raw data is involved.

Maybe I have this confusion because ParseToTrainingJob takes a *paddlev1.TrainingJob and return a *paddlev1.TrainingJob, should we make it to take a job.Spec rather than job?

Maybe we can combine Validate(job *paddlev1.TrainingJob) error and ParseToTrainingJob(job *paddlev1.TrainingJob) *paddlev1.TrainingJob into a package level function NewTrainingJob(*Spec) (*paddlev1.TrainingJob, error).

@m3ngyang
Copy link
Collaborator Author

m3ngyang commented Mar 30, 2018

I fixed some problems mentioned above. As for the following questions:

If these files are quite the same to the original ones, can we remove the old files?

I will submit a pr to remove outdated files. @typhoonzero

should we make it to take a job.Spec rather than job?

To make the most of previous structure and codes, I adopt Job and ReplicaSet structure as sub-resource of TrainingJob. But I leave a TODO in the definition of TrainingJob to simplify the structure. @helinwang

Copy link
Collaborator

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -0,0 +1,32 @@
package v1
Copy link
Collaborator

@wangkuiyi wangkuiyi Apr 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package name here should be

package edl // import "github.com/edl/v1"

instead of

package v1

Please take this file from the Go client library of Google API as a reference.

@@ -0,0 +1,32 @@
package v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that in many of our Go repos, there is a directory pkg. I have never seen such a convention in other Go repos -- not in any well-known libraries like Google's API's Go client, not Go lang's repo itself.

This convention would make it hard for other Go projects to import ours. The authors have to write:

import "github.com/paddlepaddle/edl/pkg/apis/paddlepaddle/v1"

Please be aware that there are two paddlepaddle in the above import path. Ths is pretty redundant.

To make this import path concise and clear, we might want

import "github.com/paddlepaddle/edl/v1"

To achive this, we should have training_job.go in the /v1, instead of /pkg/apis/paddlepaddle/v1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg directory is widely used in many go repos like https://github.com/coreos/etcd, https://github.com/kubernetes/kubernetes, and the intention is to hide internal implement. I think we can keep the pkg directory and put cmd/edl to edl is more general. Currently since edl does not have any API provided for users to call, we don't have to expose apis directory, /pkg/apis/paddlepaddle/v1 is only for internal use and call the k8s API.

@@ -0,0 +1,32 @@
package v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow Go's convention and rename this file from trainingjob.go into training_job.go.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -55,6 +55,7 @@ type TrainingJobSpec struct {
Passes int `json:"passes,omitempty"`
Volumes []corev1.Volume `json:"volumes"`
VolumeMounts []corev1.VolumeMount `json:"VolumeMounts"`
//TODO simplify the structure of sub-resource(mengyang)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TODO(mengyang): Simplify the structure of sub-resource.

}
}

// -----------------------------------------------------------------------
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go's comment should follow Godoc's convention: https://blog.golang.org/godoc-documenting-go-code.

apiresource "k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

paddlev1 "github.com/paddlepaddle/edl/pkg/apis/paddlepaddle/v1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import "github.com/paddlepaddle/edl/v1"

}

// setDefaultAndValidate updates default values for the added job and validates the fields.
func setDefaultAndValidate(job *paddlev1.TrainingJob) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paddlev1 => edl

And for all the rest namings.

@@ -2,15 +2,16 @@ package updater

import (
"fmt"
"reflect"
"time"

log "github.com/golang/glog"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These import paths are not sorted lexically. This violates the Go code style convention. You might need to configure your favorite editor to call either gofmt or goimports whenever you save the .go files.

Are you using Emacs? If so, a very convenient configuration is https://github.com/helinwang/go-emacs.

@@ -2,15 +2,16 @@ package updater

import (
"fmt"
"reflect"
"time"

log "github.com/golang/glog"
padv1 "github.com/paddlepaddle/edl/pkg/apis/paddlepaddle/v1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

padv1 "github.com/paddlepaddle/edl/pkg/apis/paddlepaddle/v1"

=>

"github.com/paddlepaddle/edl/v1"

@m3ngyang
Copy link
Collaborator Author

m3ngyang commented Apr 4, 2018

thx for your review @wangkuiyi and I have fixed some code style convention problem you mentioned above, and I agree with @typhoonzero so I keep the current directory structs.

// parseToTrainer parse TrainingJob to a kubernetes job resource.
func parseToTrainer(job *paddlev1.TrainingJob) *batchv1.Job {
replicas := int32(job.Spec.Trainer.MinInstance)
command := make([]string, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make([]string, 2) creates a slice which is never used. var command []string is suffice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@m3ngyang m3ngyang merged commit 57937d1 into elasticdeeplearning:develop_CRD Apr 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants